Automatically Classifying Sentences in Full-Text Biomedical Articles into Introduction, Methods, Results and Discussion

Authors

  • Shashank Agarwal
  • Hong Yu
Abstract

Biomedical texts can typically be represented by four rhetorical categories: introduction, methods, results and discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied approaches to automatically classify sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We explored different approaches to automatically classify a sentence in a full-text biomedical article into the IMRAD categories. Our best system is a support vector machine classifier that achieved 81.30% accuracy, which is significantly higher than that of the baseline systems.
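
The abstract does not describe the features, training data, or SVM configuration, so the following is only a minimal sketch of the general idea of sentence-level IMRAD classification, not the authors' system. It assumes bag-of-words features and swaps in a simplified stand-in for an SVM: a one-vs-rest linear classifier trained with hinge-loss updates and no regularization term. The toy sentences and the names `TRAIN`, `featurize`, `score`, `train`, and `classify` are hypothetical.

```python
from collections import Counter

LABELS = ["introduction", "methods", "results", "discussion"]

# Hypothetical toy training sentences (assumption: not from the paper's corpus).
TRAIN = [
    ("little is known about the role of this gene", "introduction"),
    ("previous studies have not examined this question", "introduction"),
    ("cells were incubated at 37 degrees for two hours", "methods"),
    ("samples were centrifuged and stored overnight", "methods"),
    ("expression increased significantly after treatment", "results"),
    ("we observed a twofold decrease in binding", "results"),
    ("our findings suggest a regulatory mechanism", "discussion"),
    ("these data may explain the earlier discrepancy", "discussion"),
]


def featurize(sentence):
    """Bag-of-words counts plus a constant bias feature."""
    feats = Counter(sentence.lower().split())
    feats["<bias>"] = 1
    return feats


def score(weights, feats):
    """Dot product between a sparse weight dict and a feature dict."""
    return sum(weights.get(tok, 0.0) * val for tok, val in feats.items())


def train(examples, epochs=200):
    """One-vs-rest training: one hinge-loss linear model per label.

    Simplified stand-in for an SVM (assumption): no regularization term.
    """
    models = {label: {} for label in LABELS}
    data = [(featurize(sentence), gold) for sentence, gold in examples]
    for label, weights in models.items():
        for _ in range(epochs):
            updated = False
            for feats, gold in data:
                target = 1.0 if gold == label else -1.0
                # Update only when the example is inside the margin.
                if target * score(weights, feats) < 1.0:
                    for tok, val in feats.items():
                        weights[tok] = weights.get(tok, 0.0) + target * val
                    updated = True
            if not updated:  # every example already has margin >= 1
                break
    return models


def classify(models, sentence):
    """Return the label whose linear model scores the sentence highest."""
    feats = featurize(sentence)
    return max(LABELS, key=lambda label: score(models[label], feats))
```

Calling `classify(train(TRAIN), sentence)` returns one of the four IMRAD labels for any input sentence. On data this small the result only shows the mechanics and says nothing about the accuracy reported in the abstract.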

Similar resources

Accessing bioscience images from abstract sentences

Images (e.g., figures) are important experimental results that are typically reported in bioscience full-text articles. Biologists need to access images to validate research facts and to formulate or to test novel research hypotheses. On the other hand, biologists live in an age of information explosion. As thousands of biomedical articles are published every day, systems that help biologists e...

Automatic segmentation of subfigure image panels for multimodal biomedical document retrieval

Biomedical images are often referenced for clinical decision support (CDS), educational purposes, and research. They appear in specialized databases or in biomedical publications and are not meaningfully retrievable using primarily text-based retrieval systems. The task of automatically finding the images in an article that are most useful for the purpose of determining relevance to a clinical ...

Accessing Full Text of Articles: A Study on the Status of Medical Universities in Tehran

Introduction. Due to the rapid development of information technology and the World Wide Web, there is easy and fast access to medical information and medical journals. Although there is free and easy access to articles' abstracts through Medline on the internet, accessing full-text articles still remains a problem. This study was carried out to investigate the best way we could access full text of ...

Gene Ontology Evidence Sentence Retrieval Using Combinatorial Applications of Semantic Class and Rule Patterns

Gene Ontology (GO) provides helpful information with respect to biological process, molecular function and cellular component in annotating the relationships among genes, chemicals and diseases. Due to the complexity of GO knowledge, developing automated or semi-automated GO curation techniques remains a big challenge for database curators. In order to efficiently and precisely retrieve GO i...

Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents

Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents' contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...

Journal title:

Volume 2009, Issue

Pages -

Publication date: 2009